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We investigate the state space reconstruction from multiple time series derived from continuous 
and discrete systems and propose a method for building embedding vectors progressively using 
information measure criteria regarding past, current and future states. The embedding scheme can 
be adapted for different purposes, such as mixed modelling, cross-prediction and Granger causality. 
In particular we apply this method in order to detect and evaluate information transfer in coupled 
systems. As a practical application, we investigate in records of scalp epileptic EEG the information 
flow across brain areas. 



I. INTRODUCTION 



o 

(N 

Since its publication, Takens' embedding theorem [l|, |2| is reputed as the most significant tool in non-linear time 
^SJ I series analysis and has been used in many different settings ranging from system characterization and approximation 
of invariant quantities to prediction and noise-filtering 3^. The embedding theorem implies that although the true 
dynamics of a system may not be known, equivalent dynamics can be obtained using time delays of a single time 
series, seen as the one-dimensional projection of the system trajectory. 



\^ • Most applications of the embedding theorem deal with univariate time series, but often measurements of more than 



one quantities related to the same dynamical system are available. One of the first uses of multivariate embedding was 
in the context of spatially distributed systems, where embedding vectors were formed from simultaneous measurements 
of a variable at different locations 0, Q . Multivariate embedding was used among others for noise reduction Q and 
in the the test for nonlinearity using multivariate surrogate data [7|. Another particular implementation was the 
prediction of a time series using local models on a state space reconstructed from a different time series of the same 
system [8| and with prediction criteria for the determination of the embedding vectors. In a later work, embedding 
was extended to all of the observed time series [3] with the goal again being the prediction of one time series. Finally, 
QVy ', multivariate embedding has been used widely for the estimation of coupling strength and direction in coupled systems 
C^ ' |10l4l3l |. where the embedding parameters were chosen either arbitrarily or separately for each time series. 
^D I Regarding state space reconstruction, in [IJ] independent components analysis was used to extrapolate from an 

r — . . embedding space created from multiple variables a projection with linear independent variables, bypassing thus the 
f^ ' problem of optimal parameters selection. In [15,] a modified false nearest neighbors approach was used for creating 
^D , the embedding vectors and was used to disentangle subspace dimensionalities in weakly coup led dynamical systems. 
Recently, non- uniform multivariate embedding using different delays was addressed in |16l . |l7[ , with ideas based again 
on false nearest neighbors. 

Most of these works are basically extensions of methods used in the scalar case without taking into account particular 
characteristics of multivariate embedding, such as the possible high dependence of individual components obtained 
^ . from the different time series. As in the univariate case, the optimal embedding vectors depend highly on the purpose 
of the reconstruction. 

In this work, we contemplate on the particularities of multivariate embedding and suggest a new embedding scheme 
to treat them. Based on Eraser's idea of using information criteria to select the embedding parameters in the univariate 
case [13, [3 ' "^^ propose a method for progressive building of the embedding vectors allowing for different variables 
and delays that is simple, intuitively sound and can easily be modified to handle most purposes of univariate and 
multivariate analysis. For bivariate time series, we also propose two causality measures based on the mixed embedding 
scheme. 

In Sec. ini the state space reconstruction from univariate and multivariate time series is briefly discussed, and in 
Sec. mil the proposed embedding scheme is presented together with a new method for the estimation of coupling 
strength and direction. In Sec. IIVI results from simulations on well known coupled systems are given, and in Sec. |V] 
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the proposed scheme is apphed to investigate information flow in the brain of epileptic patients. Finally, summary 
results are discussed in Sec. I VII 

II. STATE SPACE RECONSTRUCTION 

We suppose that a trajectory is generated by a dynamical system forming an attractor and we observe projections 
of the trajectory in one or more dimensions, i.e. we have measurements of a univariate or multivariate time series. The 
objective is to form a pseudo-state space in a way that as much information as possible about the original dynamics 
can be represented in the reconstructed points in this space. 

A. Univariate time series 

For a given time series {xn}n=ii the uniform embedding scheme is directly derived from Takens' embedding theorem 
[H as 

^n — V*^n ; -^n— Ti ■•■7 -^n— (m — 1)t J ■ \^) 

The two parameters of the delay embedding in ([T]) are the embedding dimension m, i.e. the number of components in 
x„, and the delay time r. According to Takens' theorem topological equivalence of the original and the reconstructed 
attractor can be established (under generic conditions) if tti > 2d + 1, where d is the fractal dimension of the original 
attractor. Among the approaches for the selection of m the most popular are the false nearest neighbors (FNN) [20| 
and the goodness-of-fit or prediction criteria [2l|, [231 . while r is usually chosen as the one that gives the first local 
minimum of the time-delayed mutual information [2j| or the first zero of the autocorrelation function [2J| . 

Judd and Mees [2q| introduced the non-uniform embedding scheme for univariate time series using unequal lags, 
so that x„ = (xn-ii,Xn-i2T ■ ■ ,Xn-i„i), and used it to obtain global reduced autoregressive models with the help 
of basis functions. The embedding vectors were constructed progressively by a minimum description length (MDL) 
criterion [2^. In [27| again MDL and model prediction error, and in [28| a false nearest neighbors statistic, were used 
to construct the embedding vectors. 

B. Multivariate time series 

Given that there arep time series {xi^n}^=l^ * = I7 • ■ • jPj generated through p different projections of the trajectory 
of the original dynamical system, the simplest extension of the uniform reconstructed state space vector in ([IJ is of 
the form 

^n ^ \X\^m 2;i.„— T, . . . , 2;i,ri— (mi — 1)tj ^2,tij ■ • • j ^^p.ri— (mp — 1)t j (.^j 

and is defined by an embedding dimension vector m — (mi, ...,mp) that indicates the number of components used 

p 
from each time series. The dimension of the reconstructed space in this case is Af = '^ rui. Also one can choose to 

use different t for each time series, defining so a time delay vector t — (ti, ...,Tp). 

C. Problems and restrictions of multivariate state space reconstruction 

A major problem in the multivariate case is the problem 0/ identification. There is often not a unique m that unfolds 
fully the attractor, even for a fixed M . For the univariate case, given a value for r and the minimum embedding 
dimension m — 2d+\, the embedding vectors are uniquely defined. Here, theoretically, all different combinations of 
rai that fulfill M ^ 2d+\ would fully unfold the attractor 0], provided that all the variables are from the same system. 
In practice things are not so simple. Suppose we have a method for comparing these reconstructions and selecting the 
optimal one. Due to finite data size and resolution one of these reconstructions would be deemed marginally better 
than the others. If we add small observational noise to the data and compare again the reconstructions, the one 
selected as optimum would most likely change. We elaborate more on this when we discuss the proposed embedding 
scheme below. 

An issue of concern is also the fact that multivariate data do not always have the same range, so that distances 
calculated on delay vectors may depend highly on the components of large ranges. So, it is often preferred to scale 



all the data to have either the same variance or be at the same data range %y\. In our study, we choose to scale the 
data to the range [0, 1]. 

A different problem of multivariate reconstruction is that of irrelevance, when time series that are not generated 
by the same dynamical system are included in the reconstruction procedure. This may be the case even when for two 
time series connected to each other through the observation function, only one of them is generated by the system 
under investigation. 

The opposite problem to irrelevance is that of redundancy. Often the observed variables may be synchronous or 
lagged correlated and this may add overlayed information in the embedding vector representation. In linear mode ling , 
the problem of redundancy has been addressed using appropriate order selection and regularization techniques |29l.[30|. 
and these tools have been used in local state space models for univariate time series prediction [3l|, [S^l . Nonlinear 
redundancy measures have been existing for long time (e.g. [1^, [SJl and references therein) and have been used in 
univariate time series, e.g. for the selection of optimal delay t. Still, when selecting r, so that two consecutive 
components of the embedding vector are not correlated to each other, this does not establish that non-consecutive 
components are also not correlated [SJ] • Thus the problem of redundant information should be treated collectively 
for all components of the embedding vector and not only for pairs of components, which constitutes a difficult task. 
The situation is even more difficult for multivariate time series, where redundancy should be investigated for both 
different and lagged variables. Our approach is aiming at treating this problem. 

III. PROPOSED STATE SPACE RECONSTRUCTION 

A. Non-uniform multivariate reconstruction 

Usage of the non-uniform scheme for multivariate embedding may reduce better the redundant information than 
the embedding with fixed lags. The most general embedding scheme allowing for mi varying lags lij, j = 1, . . . ,mi at 
each component variable Xi is 

^n — X'^l.n — lii ; '^l^n — li2 ; • ■ • i •^l.n — lim^ 7 *^2,n — /21 5 • ■ • ? "^P-n — lprn )• 

Inspection of all possible combinations of components for the determination of the optimum embedding (with any 
criterion we may choose) can be computationally intractable when the embedding dimension and the number of time 
series are large. For a moderate example when p — 3, M = S, and the maximum lag for all time series is 10 we have 
4050 cases, while when we increase M to 4 we have 27405 cases. It is evident that for most practical purposes we 
need a progressive method to build the embedding vectors. 

Assume that we have somehow chosen the maximum lags Li for each time series i — 1, . . . ,p. Then we have a 
collective set of the Li + L2 + ■ ■ ■ + Lp candidate components 

■t> \Xl_n: '^l^n—lj • • • j '^l,n — Lij -^2,71^ ■ • • ; -^p^n—Lp J 

and the embedding vector b to be selected is a subset of B. Using ideas developed for the state space reconstruction 
from univariate time series, the reconstructed vector must satisfy two properties. First, its components must be least 
dependent to each other, and secondly, it must be able to explain best the dynamics of the system, namely its future 
(or better our knowledge about its future) for one, two or several steps ahead. At time n the future of the system is 
represented by the vector 



Xf — (a;i,n-|-l, 2^1, n+2, ■ • ■ , Xi^n+Ti, X2,n+1, ■ ■ ■ iXp^n+Tp), 



for appropriate time horizon Ti for each variable Xi,i = 1, . . . ,p. 

The proposed scheme is progressive, and starting from an empty vector bo, suppose that at step j we have already 
selected the j — 1 components bj_i = {x\, xj, . . . , x*,^). Then we add the component x* G B \ {x\, . . . , x*,^] that 
fulfills the following maximization criterion 

max{/(xF; (a;j,bj_i))} , (3) 

Xj 

where lhi.p\ (a;j,bj_i)) is the mutual information between the variable x^ that represents the future of the system 
and the new embedding vector (a;j,bj_i). The maximum value /(x^?; (x*,bj_i)) regards the largest amount of 
information on x^ that can be gained by the augmented embedding vector, and thus bj = (x*,bj_i). 



As a stopping criterion for this scheme of progressive vector building we use a threshold for the change in the mutual 
information of x^ and the embedding vectors in the two last steps. We stop at step j and use bj_i as optimal if 

l{xF;hj^i)/l{xF;hj) > A, (4) 

where A is a threshold value, and A < 1. A threshold near 1, e.g. A = 0.95, allows the inclusion of a new component 
in the embedding vector even if the augmented vector explains very little of the information on yip that was not 
explained at the previous step. 

Depending on the form of B and xj? this approach can be used for univariate modelling when both B and Xp 
contain elements from the same time series, cross modelling when xp has elements from one time series and B from 
another, mixed modelling if x.p has elements from one time series and B from all time series and "full" modelling if 
x^ and B have elements from all time series. In this way, we give a purpose to the embedding vectors that we create. 
If one wants to check relations between the time series, cross modelling and mixed modelling should be preferred, 
while for the estimation of invariant quantities of the underlying system a full modelling would be better. 

The mutual information regards vector variables and thus the estimation is more difficult than for the standard 
delayed mutual information used to estimate the fixed lag r for univariate time series. Thus binning estimates of 
mutual information are not appropriate here as data demand increases dramatically with the dimension, at the level of 
millions of data points for large dimension. Since our embedding scheme depends heavily on the accurate estimation 
of information measures, the estimation method is of outmost importance. 

B. Mutual information estimation 

In [35| , the mutual information (MI) of two variables X and Y of dimensions dx and dy , respectively, was estimated 
from a sample of length N based on the well known formula 

I{X,Y)=H{X) + H{Y)-H{X,Y), (5) 

with entropies H estimated from X- nearest neighbor distances (for details, see |35|). 
First the joint entropy was estimated by 

d +d ^ 
HiX,Y) = -V(fc) + ^(N) + \og{cd^Cd^) + ""^ ^ J2^ogeii), (6) 

i=l 

where e(i) is twice the distance of the i-th sample point in the joint space {X, Y) to its K-ih neighbor, ipix) is the 

digamma function {ip{x) = ^logr(a;) — w^) and Cd is the volume of the d-dimensional unit cube, using maximum 

norm as the distance metric. In order to obtain the entropy H{X) the space of variable X is treated as a projection 
from the joint space and so 

. N ,7 ^ 

with nx{i) the number of points whose distance from the z-th point of X is less than e(i)/2, plus one. From Eq. ([5|)-([7|) 
and denoting by (...) the average over all i immediately follows 

I{X; Y) = i;(k) + ^{N) - (^K(i)] + ^[uyii)]). (8) 

Similarly to other estimates of MI, this one suffers from bias that depends heavily on the dimension of the vector 
variables. The setup of the embedding procedure allows us to use the conditional mutual information (CMI) instead 
that, as will be explained further below, gives smaller bias. Assume three time series X, Y and Z of length N with 
dimension dx, dy and dz- CMI is defined as 

I{X- Y\Z) = i{x- (y, Z)) - I{X- Z) = (9) 

= H{X,Z) + H{Y,Z)-H{Z)- H{X,Y,Z). 

Following the above method, we estimate the joint entropy H(X, Y, Z) using X-nearest neighbors and obtaining a 
formula similar to Eq. (jB]). Then the entropies H{X,Z), H{Y,Z) and H{Z) are estimated by projecting the space 



of {X,Y,Z) onto the respective subspaces of {X,Z), {Y,Z) and Z, obtaining formulas similar to Eq. ([7]). After 
substitution, we get the i^-nearest neighbor estimate for CMI as 

i{X; Y\Z) = ^{k) - (^K.(z)] + ^K-(^)] - ^K(i)]), (10) 

where Uxzi}) is the number of points whose distance from the i-th point of [X, Z) is less than e(i)/2, plus one, and 
the same for nyz{i). 

An alternative derivation of the expression of I{X; y | Z) is from the difference of the estimates of the two MI in 
Eq. (O. The first MI is estimated from Eq. dH) for the variables X and {¥, Z) as 

i{x- (y, z)) = ijj{k) + ij{N) - (7^K(i)] + i^inyzit)]). (11) 

Note that the second MI is not estimated independently, i.e. from Eq. ([8]) for the variables X and Z, but through 
the projection of the space of {X, Y, Z) onto the subspace of {X, Z). Using the definition of MI in Eq. ([S]) and the 
estimate of the projected entropy in Eq. (O, we get 

ip{X- Z) = 4>{N) - (VK(i)] + ^K(*)] - ^[n.Ai)])- (12) 

Our embedding scheme relies on the estimates in Eq. ([TOl) . Ec^. (fTTj) and Eq. (fT2| . Therefore we will examine the 
bias of these estimates. Possible factors for the bias of the MI estimate in Eq. (ITTT) and Eq. (IT^ are the number 
of available data N, the correlation structure for the variables denoted loosely as c and the dimension of the joint 
variable space d [35|. In our approach, rfy = 1 and the estimate I{X; {Y, Z)) of the real I{X; (Y, Z)) would be 

i{x- (r, z)) - i{x- (y, z)) + b{n, dx + dz + 1, c), (13) 

where B{N, dx + d^ + 1, c) is the unknown bias term. Assuming that I{X; Y \ Z) = I{X; {Y, Z)) — I{X\ Z) and i?i, 
B2 are the bias of the first and second term, the estimate for CMI is 

i{X-Y\Z) = IiX;iY,Z)) + Bi{N,dx + dz + l,c) 
-IiX;Z)-B^{N,dx + d,,c') 
= I{X;Y\Z)+BiiN,dx + d, + l,c) 

-B2{N,dx + d,,c') (14) 

Since in Eq. (fT4|) we have the difference of two bias terms we expect that if MI is systematically underestimated or 
overestimated, the CMI estimate will have reduced bias. 

The CMI estimate in Eq. PH)) truly has reduced bias, as a limited simulation study revealed. We tested three 
indicative cases for Gaussian multivariate time series of zero mean, unit variance and varying dimension and correlation 
structure. The multivariate time series regard the variable set {X,Z,Y) of dimensions {dx^dz^dy) respectively, and 
in each of the three cases we fix the dimensions of X and y , and vary the dimension of Z and the correlation structure 
between the variables. With reference to the embedding procedure, X corresponds to xf, Z to bj_i and Y to Xj 
(dy = 1 for all cases studied in accordance with the embedding procedure). 

For the first case, we choose the {X, Z, Y) time series from a 10-dimensional time series W with correlation matrix 



R 



1 r ■ ■ ■ r 
r \ ■ ■ ■ r 

r r ■ ■ ■ 1 



as follows. For dz = 1, we take the first 3 time series with correlation matrix given by the top-right 3x3 block of i?, 
denote the first one as X , the second as Z and the last as Y . For d^ = 2 we take the first 4 time series, denote the first 
again as X , the 2 next as the 2-dimensional Z and the last as Y and so on for dz ~ 3, . . . , 8. The cross-correlation 
is the same among all variables at a level r ranging from 0.2 to 0.9 with step 0.1. With this setup we want to study 
the effect of the change in dimension of Z (bj-i) to the MI and CMI estimation when the time series are correlated 
to each other at the same degree. 

In the second case, W has dimension 17 and R is as above but with size 17 x 17. Thus the only change is that X 
is 8-dimensional comprised of the first 8 time series of W , while dy = 1, d^ ranges again from 3 to 8, and r from 0.2 
to 0.9. In this way we study the same effect as before in the presence of a larger dimension for variable X (larger 
dimension for xj?). 



For the third case, W is lO-dimensional with correlation matrix 



R 



1 0.9 ••• 0.2 
0.9 1 •■• 0.3 

0.2 0.3 ••■ 1 



and the setup is as in the first case. Here we want to study the dimension effect when the correlation between the time 
series is varying. This simulation is more in context (although by no means identical) to the progressive procedure, 
where we would expect first the most relevant lag to be selected, then the second more relevant etc. 

The theoretical values for I{X; (Y, Z)) and I{X; Y \ Z) can be computed with the help of the expression of entropy 
for a multivariate Gaussian process X with unit variance J36| 

H{X)^^\og{{2nef\R\), 

with d the dimension of the variable X and \R\ the determinant of the correlation matrix R oi X. We estimate the 
average MI and CMI of 1000 simulations for each case (and different dz and r) and compute the mean of the bias 
terms for I{X; {Y,Zj) and I{X;Y \ Z), given as < / > —/theoretical- In Fig- [2 we give the bias for all cases. We 




FIG. 1: (Color online) Bias of mutual infomiation with black solid line and conditional mutual information with gray dashed 
line (cyan on line) for Gaussian time series with (a) dx = dy = 1 and dz = 1,...,8, A^ = 4096, (b) dx = dy = 1 and 
dz = I, ■ ■ ■ ,8, N — 8192, (c) dx ~ 8,dY ~ 1 and dz = 1, ■ ■ ■ ,8, N = 4096. Lines correspond to different correlation coefficient 
r — 0.2, . . . , 0.9 with step 0.1. In (d) we have the case of varying correlation coefficients with dx = dy = I and dz ~ 1, ■ ■ ■ ,8, 
N = 4096. 

see that the bias of CMI is constantly close to zero, while for MI it is large and can be both negative and positive 
depending on the dimension and correlation of the variables. Figures [IJa and b) show that as sample size increases 
MI bias barely reduces. In Fig. [TJc) where dx = 8 the bias is much larger for MI, and in Fig. [IJd) the case of varying 
correlation coefficients shows the same results. The number of neighbors used for the estimation of I{X; (Y, Z)) and 
I{X;Y I Z) in this example is set to 10 as in all the Monte Carlo simulations in our study. This choice was decided 
after a pilot study on a wide range of K values (from 5 to 30) showed that though the mutual information values 
changed with K the embedding vector forms were stable. We note that for a small time series length, e.g. N — 512, 
a smaller value of K may be more appropriate but for consistency purposes we use K = 10 for all N. 

The analysis above shows that CMI is a better information measure to use in our embedding scheme than MI. 
Denoting X = xp, Y = Xj and Z = bj_i, from Eq. ^ CMI is related to MI as 

I{xf;xj I bj_i) = /(xF;(2:j,bj_i))-/(xF;bj_i). (15) 

Thus instead of using /(x^; (x^, bj_i)) in the information criterion for embedding of Eq. ([3]), we can use I(xp;xj \ 



bj_i). We refer to the two criteria as 

lo : max{/(x_F; (xj,bj_i))} , 

Xj 

Ii : max {/(xi?; Xj |bj_i)}. 

Xj 

Theoretically, both criteria are equivalent as the term /(xi?;bj_i) in Eq. (jisp is constant at step j with respect to 
Xj. However, in practice this is not true, since the estimation of I(xp] bj_i) is done by the formula in Eq.(|12|) that 
depends on Xj. Due to the lower bias in the estimation of CMI, as shown above, we expect Ii criterion to perform 
better. 

Another serious effect of the bias in the estimation of MI is on the stopping criterion of the proposed state space 
reconstruction in Eq. (j4]) (common for both Iq and Ii criteria). Let us suppose that the two terms /(xi^;bj_i) and 
l(xp;hjj in the stopping criterion are estimated independently. The difference in dimension by one of bj_i and 
bj regards a different bias in the two MI terms. The correlation structure may also contribute to the bias but at a 
lesser extent as it does not change a lot by the inclusion of a single component. The bias can even be negative, and 
moreover, the bias for /(xi?;bj) may be larger in magnitude than for /(x^?; bj_i), so that l(xp;hj-i) > l(xp;hj^ 
may occur, which cannot happen in theory. This theoretically impossible result was observed in simulations. 

In order to balance the bias due to dimension change and correlation structure we estimate l(x.p;hj^ij using a 
projection of the space regarding l(x.p; (a;j,b_,_i)), as given in Eq. p2)) . As we have seen, CMI has near zero bias, and 
thus the bias of Ip(xp; b^-^i) should be approximately equal to the bias of /(x^?; (xj, bj_i)) (estimated by Eq. (fTTj) ). 
making the two terms better comparable in the stopping criterion. 

To demonstrate the two non-uniform multivariate embedding schemes we give an example for the x and z variables 
of the Lorcnz system [33] 

x{t) = 10{y{t) - x{t)) 

y{t) = x{t){2^ - z{t)) ~ y{t) 

z{t) = x(i)y(i)-8/3z(i), 

contaminated with observational Gausian white noise added to each variable with a standard deviation 10% of the 
standard deviation of the variable. The time series length is 10000 and the sampling time is t^ = 0.05 sec. We are 
interested in predicting the x variable and set "Kp = {xn+i, ■ ■ ■ ,Xn+5)- We set L^ = L^ ^ 25 and use the value 
A = 0.95 for the stopping threshold value. 

For the given threshold, the obtained embedding vectors for the Iq criterion have the form {xn,Zn-i, Zn-io), while 
Ii gives (x„, z„_i, 2;„_ii). In Fig.[2l the values of the respective MI and CMI are given for the first 4 embedding cycles 
(an iteration that results in the addition of a component in the embedding vector), along with the selected component 
at each cycle. 

The first embedding cycle is identical for both Iq and Ii as the conditioning vector is empty, and the forms of 
MI and CMI are very similar at the second embedding cycle. They differ a bit at the third embedding cycle, which 
leads to the selection of different components (z„_io for Iq and z„-ii for Ii), and consequently the fourth embedding 
cycle gives completely different components. In Fig. [2][e) we give the MI and CMI for 10 iterations of the embedding 
procedure. 

MI increases to a maximum value with the embedding cycles and then decreases, only slightly, due to the bias. 
CMI decreases gradually and monotonically to zero, which means that the new components contribute gradually less. 

C. Estimation of information transfer using non-uniform multivariate embedding 

Non-uniform multivariate embedding can be used for measuring information flow between coupled systems. The 
main approach for estimating information transfer is based on the idea of Granger causality [38]. Simply put, having 
measurements from x and y, a causal relationship from a time series x to another y, can be identified if the prediction 
of y gets better by including also values of x. One way to implement this idea is to find the best model for predicting 
y from its previous values and the best model for predicting y from previous values of y and x, and compare the two 
predictions. If the mixed prediction that uses both y and x is better, then there is causal relationship from x to y. 

An interesting fact is that the idea of Granger causality contradicts Taken's embedding theorem, according to which 
for explaining the dynamics of y (and consequently predicting its values) a time delay reconstruction from components 
of y is sufficient, and therefore x values are not needed. As a matter of fact, in [l^] a measure related to Granger 
causality was used in a case of strongly unidirectionally coupled Henon maps and the causality relation could not be 
detected. This was because both the best univariate model and the best mixed model gave approximately equally 




FIG. 2: (Color online) (a)-(d) The first four embedding cycles of lo for the reconstruction aiming to explain Lorenz's x variable 
from X and z variables. Each panel shows the mutual information for all candidate components at each iteration, where the solid 
line represents lags of x, the dotted line represents lags of z and the selected lag is highlighted. The selected lag in panel (d) 
is not included in the embedding vector because the ratio exceeds the threshold value. Panels (f)-(h) show the corresponding 
to (b)-(d) embedding cycles using Ii and showing the conditional mutual information. The first embedding cycle is omitted 
because it is identical to (a) . Panel (e) has the values of mutual information (criterion lo - black line) and conditional mutual 
information (criterion Ii - grey line, cyan online) for 10 embedding cycles, with the left and right vertical axes corresponding 
to the values of MI and CMI, respectively. 

good predictions. Our approach of detecting causality aims at rendering this problem. Simply, if the embedding 
procedure for prediction of y results in embedding vectors that contain components of x, this means that information 
from X is transferred to y. 

Depending on the coupling strength and the nature of the variables, some adjustment of the parameters may be 
needed. If the coupling strength is quite small the stopping criterion threshold may need to be increased to values 
over 0.95, and in the case of flows or time series with different complexity, different time horizons spanned by x^? 
should be tested. 

Further, we introduce two ways to quantify the coupling strength if the embedding procedure results in a state 
space of mixed components, i.e. from x and y. Suppose that the embedding vector aiming to explain x^^ that contains 
only future values of y (e.g. x^? = (yn+i)) has the form 



Xn — \Xn—lii : Xn — li2 7 • ■ • j ■^n—lim-^ i 2/n — (21 ) • ■ • : J/n— '27712 -'■ 

We estimate one-step ahead prediction with local model on these embedding vectors. For each target point in the set 
of all reconstructed points, the prediction is simply the mapping of its nearest neighbor found from all other points 



in the set, and we compute the normalized root mean square error (NRMSE) E. 



E^ 



\ 



n 

Y.ivn+i-y)'^ 



where ijn+i is the in-sample one step ahead prediction at time n and y is the sample mean. Then we consider the 
reduced embedding vectors using components only from x or y 

We compute the E of local model fit of yn+i using the projected embedding vectors x^^, E.^2, and obtain the measure 
of coupling from a; to y as 

Sx^Y = l-^- (16) 

If X drives y the omittance of components of x (the use of x^j instead of x„), would increase E, so that i?x2 > E^, 
and thus S would be positive. If there is no coupling, then components of x are not useful and either they are not 
included in x„ (x„=x^) or they do not contribute significantly in predicting y, so that E.^2 < E-^. In any case, the 
result is that S is not significant. In our study we varied the prediction to T-step ahead depending on the system and 
form of Tip . 

We propose a similar measure to S replacing E with mutual information, giving 

/(xf;x^) /(xf;xi | x^) 
nx^Y — i w s- — -r, s ■ {>-<) 

/(xf;x) /(xf;x) 

Rx^Y measures the information of y that is explained by components of the embedding vectors that emanate from 
x, normalized against the total mutual information in order to give a value between and 1. The measures for the 
opposite direction are defined accordingly. 

IV. SIMULATIONS AND RESULTS 

A. Driven Henon 

We evaluate the embedding vectors derived by criteria Iq and Ii and test our embedding scheme for coupling 
strength estimation on some coupled systems. First is the driver-response Henon system [39| given by 

Xn+i = 1.4 - x^ -|-0.3x„_i 

2/n+i - 1-4 - {CynXn + (1 " C)yl) + 0.3y„_i 

The time series length is 4096 and the coupling strength takes values C=[0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]. The 
non-uniform multivariate embedding procedure is applied on 100 realizations for each C and the embedding vector 
that has the largest frequency of occurrence is selected. We set L^ — Ly = 5, to account for all significant delays, and 
xp = (xn+i) for predicting x and x.p = (yn+i) for y, as one step ahead is sufficient to detect causal effects (e.g. see 
[13|V We let the stopping criterion threshold A = 0.95 as for maps small variations of A do not alter the form of the 
embedding vectors. 

Across the 100 realizations for each C, a single embedding vector form was found in both methods (but not 
necessarily the same for the two methods). For the prediction of x both Iq and Ii produced the vector (x„,x„_i) for 
all C, while for the prediction of y the two methods differed slightly: method Iq produced (y„, 2/n-i) for C = 0, 0.05, 0.1, 
(x„_i, j/„,2/„_i) for C — 0.2,0.3,0.4 and (x„,a;„_i,2/„, yn-i) for C — 0.5, 0.6, while method Ii differed only for C — 0.1 
selecting the form {xn-i,yn,yn-i), i-e., detecting the correct driving also for weaker coupling strength. 

On the same 100 realizations the selected embedding vectors are used for the estimation of average values for £^x, 
E^2 , as well as the Sx^y and Rx-+y coupling measures and are shown in Fig. [3l The results on the measures are in 
agreement with the form of the embedding vectors. The monotonic rise of the Sx-fY curve shows that as C increases 
the contribution of x in predicting y also increases. The use of criterion Ii shifts the detection of coupling to smaller 
C. The results for the Rx^y measure for both criteria are similar to those for Sx^y- 
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FIG. 3: (Color online) (a) Sx (solid line) and E^2 (dashed line) for the driven Henon y across all C, where embedding vectors 
are suggested by lo in the upper panel and Ii in the lower, (b) Sx^y and Rx^y for the same system across all C and for 
both criteria (solid line for lo and dashed for Ii, with the left vertical axis corresponding to values of 5* and the right of R). (c) 
The same as (b) for unidirectionally coupled Henon maps with 20% noise. 

TABLE I: Embedding vectors and their frequency of occurrence for the system of unidirectionally coupled Henon maps with 
20% noise. 



c 




I() {x,y) ^ X 


lo {x, y)^y 




Ii {x,y) -^ X 


Ii {x,y) -^ y 





{Xn 


Xn-l,Xn-2) 


100 


{yn,y„-i,yn~2) 


97 


{Xn 


a;n-i, Xn-2) 


92 


{yn,yn-l,yn-2) 91 


0.05 


{Xn 


Xn-l,X„-2) 


100 


{yn,yn-i,yn~2) 


97 


(Xn 


Xn-1, Xn-2) 


92 


{yn,yn-i,yn-2) 80 


0.1 


{Xn 


X„^l,X„-2) 


100 


(yn,yn-i,yr,^2) 

(yn,yn-l) 


53 
22 


{Xn 


Xn-1, Xn-2) 


92 


(Xn-l,yn,yn-l,yn-2) 42 

{yn,y„-i,yn-2) 31 


0.2 


(Xn 


X„-l,X„-2) 


100 


(a;„_i,j/„,2/„_i,y„_2) 
(a;„_i,i/„,y„_i) 


52 
33 


{x„ 


Xn-l,X„-2) 


92 


{Xn-l,yn,yn-l,yn-2) 45 
{Xn-l,yn,yn-l) 28 


0.3 


{x„ 


X„-l,X„-2) 


100 


{Xn,yn,yn-\) 

ix„-i,y„,y„-i) 


55 
19 


(x„ 


Xn-1, Xn-2) 


92 


{x„,Xn-2,yn,yn-l) 47 

{x„,yn,yn-i) 27 


0.4 


{x„ 


X„-l,X„-2) 


100 


(a;„,x„_2,y„,y„_i) 
{xn,y„,y„-i,yn-2) 


34 

27 


{x„ 


Xn-1, Xn-2) 


92 


{Xn,Xn-2,yn,yn-l) 79 
{Xn,yn,yn-l,yn-2) W 


0.5 


(Xn 


X„-l,X„-2) 


100 


ix„,yn,yn-i) 

{x„,X„-2,yn,yn-l) 


52 
19 


{Xn 


Xn-1, Xn-2) 


92 


{Xn,Xn-2,y„,yn-l) 53 
{Xn,yn,yn-l,yn-2) 30 


0.6 


{Xn 


Xn-l,X„-2) 


100 


[Xn, y-n, yn — 1 


89 


{x„ 


Xn-1, Xn-2) 


92 


{xn,yn,yn-i) 85 



We add 20% observational noise to each variable {x and y) and repeat the analysis. Table H] shows the selected 
embedding vectors and their frequency. When the frequency of the most frequently selected vector is small, also the 
second most frequently selected vector is shown. Careful examination of the second most frequently selected vector 
form shows that these vectors are either the most selected vector forms with omittance of a component, or with 
replacement of a component. The omittance is due to the threshold value that may be strict and in conjunction with 
the presence of noise at cases does not allow for another component inclusion. The replacement is due to almost 
equivalent information in the regarded components, so that due to noise perturbation either of them is selected 
"randomly". Figure [3jc) shows Sx^y and Rx^y calculated for the most frequently selected embedding vectors. 

The presence of noise changes the embedding vectors and decreases Sx-^y values, but the information transfer 
results are similar to the noise- free case. As shown in Fig. [3Jc), Rx-^y works also better for criterion Ii than for Iq, 
where for the latter Rx^y does not increase monotonically with C . 



B. Bidirectionally coupled Henon maps 

The next system is for two bidirectionally coupled identical Henon maps [40| given by the equations 



yrn 



= 1-4 -a;2 
= 1-4 -y^ 



-0.3.T„_i+C2(x2 -y2) 

0.3y„_i + Ci (y2 - 4) 



■0.102 


-0.104 


0.055 


0.056 


0.322 


0.328 


0.148 


0.148 


■0.151 


0.407 


0.055 


0.158 


■0.140 


0.601 


0.061 


0.227 


■0.080 


0.639 


0.066 


0.268 
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Ci is the strength of x driving y and C2 the opposite. We apply only Ii criterion for the embedding vector selection 
and set L^ = Ly = 5, N — 4096, Xp — (y„+i) for y and Xp = (x„+i) for x. The results for some specific couphng 
strength values are given in the second and third column of Table HIl For equal coupling strengths, there is exact 

TABLE II: Embedding vectors and their frequency of occurrence for bidirectionally coupled Henon maps using criterion Ii, 
along with S and R values for the most frequently selected vectors. 

[Ci,C2] Ii {x,y) -> X Ii {x,y) -^ y Sy^x Sx^y Ry^x Rx^y 

[0.05,0.05] {xn,Xn-i,y„) 97 (y„,j/„_i,a;„) 99 

[0.1,0.1] {x„,x„-i,y„) 52 {y„,y„-i,Xn) 4:9 

{x„,x„-2,yn) 37 {y„,yn-2,x„) 38 

[0.1,0.05] (x„,x„-i,y„) 97 {y„,yn-i,Xn) 100 

[0.15, 0.05] (a;„,a;„-i,y„) 100 (2/n, J/n-i, x„) 100 

[0.2,0.05] (a;„,a;„_i,y„) 100 {y„,y„-i,Xn,Xn-i) 54 

{yn,yn~l,Xn,Xn-2) 16 

interchange of x and y in the embedding vectors. For the last case with Ci = 0.2 and C2 = 0.05 we observe that when 
predicting x the embedding vectors are smaller than when predicting j/, as we would expect. The fact that for all 
cases, except the last one, the embedding vectors have the same form, does not allow us to derive conclusions about 
the individual coupling strengths and we turn to quantitative results of S and R measures. 

The negative values of Sy-^x (see Table |lll show that the coupling cannot be detected by S when it is very weak 
(C2 = 0.05) and the same holds for Sx^v when Ci — 0.05. When there is substantial coupling of equal strength in 
both directions (Ci = C2 = 0.1), Sx^y and Sy^x take equally large values. When coupling increases only in one 
direction, as for C2 = 0.05 and Ci = 0.1, 015, 0.2, the respective Sx^y measure increases as well. For weak coupling, 
R performs better than S giving values slightly over zero and capturing the weak relationship between the variables 
(see Table HI]) . As coupling increases so does the values of Rx-^y and the difference between Rx^y and Ry^x shows 
the direction of the strongest coupling. 

C. Unidirectionally coupled Rossler— Lorenz 

The third system we study is the coupled Rossler-Lorenz system 1411] given by 

xi{t) = 6(-yi(i)-z(t)i) 

yi{t) = 6{xi{t)+0.2yi{t)) 

ii(t) = 6(0.2 + xi(i)zi(i) -5.7zi(i)) 

X2{t) = 10(y2(t)-X2(i)) 

2/2 (t) = 28x2{t)-y2{t)-X2{t)z2{t) + Cyiitf 

Z2{t) = -8/3z2{t)+X2{t)y2{t). 

We use the j/i variable of the Rossler system as the driving time series, which we denote x, and 2/2 of the Lorenz system 
for the response time series, which we denote as y. The coupling strength takes values C = [0, 0.5, 1, 1.5, 2, 3, 4]. We 
take Lx — Ly ~ 15 to account for all significant delays. 

First, we set N = 4096 and apply the embedding procedure for x^? = (a;„+i,x„+2, a^n-fs) for x and xp — 
{yn+i,yn+2,yn+3) for J/, and let A — 0.95 (discussion on the choice of xp and A follows below). The most fre- 
quently selected vectors from 100 realizations are given in Table IIIII Again there are no embedding vectors for 
predicting x that have components of y. We observe that Iq fails to detect the information transfer from x to y for 
small values of C and works well only for C > 3. On the other hand, Ii detects the contribution of a; for C > 0.5. 

We estimate the average values of Sx^y and Rx^y on the 100 realizations for the most frequently observed 
embedding vectors, and the results are shown in Figure IH^a). In accordance to the form of 'x.p we modify the 
prediction models to make 3-step ahead prediction instead of 1-step. The contribution of x to the prediction of y is 
rather high for large C. Again criterion Ii works better than Iq. Sx^y increases almost monotonically, and only 
for C = 4 has a slight decrease. Rx-^y shows similar behavior, but starts decreasing from C — Z. For this system, 
generalized synchronization occurs for C > 3 [41| and this may be the reason for the decrease of Rx^y- Under 



12 



TABLE III: Embedding vectors and their frequency of occurrence for the unidirectionally coupled Rossler-Lorenz systems. 



C 



lo {x,y) -^ X 



lo {x,y) -S- J/ 



Ii {x,y) -^ X 



Ii {x,y) 



> y 

61 
35 

58 
21 
72 
17 
56 
13 
88 
72 
27 
90 






{xn,xn-e) 100 


0.5 


{Xn,Xn-6) 100 


1 


{Xn,Xn-(i) 100 


1.5 


{x„,X„-6) 99 


2 


(a;„,a;„_6) 100 


3 


{Xn,Xn-(i) 100 



(a;n,a:n-6) 100 



(2/n,J/n-2) 100 

{yn,yn-i,yn~2) 56 

(y„,i/„_i)35 

{yn,yn-i,yn-2) 56 

{yu,yn-i) 44 

{y„,y„-i,y„-2) 100 

(yn,yn~i,yn-2) 99 
{x„,x„-g,yn,yn-2) 86 

{Xn,Xn-S,yn,yn-2) 99 



a;„,a;„_6) 100 

a;„,a;„_8) 100 

a;„,a;„_6) 100 

X„,X„_6) 100 

a;„,a;„_6) 100 

a;„,a;„_6) 100 

(Xn,Xn_6) 100 



{yn,yn~2,yn~12) 

(yn,yn-2,J/n-13) 

(yn,yn-2,yn-13) 

{Xn-9,yn,yn-2,yn-13) 

{x„-S,yn,yn-2,yn-6) 

{X„, X„-S, yn,y-n.-2, yn-d) 

(Xn-S, Xn-12,y-n, ^71-2) 

(a;„_8, a;„_i2, yn, j/n-2, yn-s) 

{Xn,Xn-S,y,i,yn-2) 
{Xn,Xn-S,yn,yn-2) 

ix„,yn,yn-2) 

{x„-9,yn,yn-2) 
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FIG. 4: (Color online) As Fig. |3jb) and (c) but for the coupled Rossler-Lorenz system. 



generalized synchronization there exists a function (/> such that y = (j^i^) i^ ^^^ V ^^e functionally related) which 
means that x^ and x^ contain a lot of common information and thus /(xi?;x^ | x^) decreases. 

The best embedding dimension for both Rossler and Lorenz systems is known to be 3 when no coupling is present. 
Thus for the unidirectional coupling from a; to y we should estimate three dimensional vectors comprised only of 
components of x when predicting x. Both criteria underestimate the embedding dimension in this case (see columns 
two and four of Table llll| due to the strictness of the threshold [A = 0.95) or the choice of Xi?. For example, using 
A = 0.99 and Ii criterion the embedding vector contains the component a::„_io in addition to the components x„ and 

•^n — 6 ■ 

Our simulations have shown that a larger A allows for more iterations of the embedding scheme and then more 
components from the driving time series are likely to enter the final embedding form. For the Rossler-Lorenz system, 
even for very small C, the driving effect is detected by the embedding vector form for larger A, and subsequently the 
directed coupling measures S and R get positive. In Figure [51 we show the graph of Rx^y vs A for the same :>cp as 
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above, the Ii criterion, and for C = 0.5 and C = 1. Note that the departure of Rx^y from the zero level is succeeded 

(a) 




FIG. 5: (Color online) Rx~*y vs A for the coupled Rossler-Lorenz system using the Ii criterion and for different time series 
length A'^, as shown in the legend. For each A'^, the dark line is the median and the two gray (cyan on line) lines are the 2.5% 
and 97.5% percentiles. The coupling strength is C = 0.5 in (a) and C = 1 in (b). 



for smaller A as the time series length increases, e.g. for C — 0.5 the coupling can be detected {Rx^y being positive) 
for A > 0.95 when N = 8192 and A > 0.97 when N = 2048, whereas for C = 1 the detection is succeeded for the 
whole examined range of A and for both N. Also, for small time series, as for N = 512 in Figure [5l the very weak 
coupling can still be detected but for larger A and with a large variability of the Rx^y measure. For such small time 
series, a large A does not result in a consistent embedding form as small changes in the data (at each realization) may 
give different embedding vectors. This may cause that components from one time series may enter the embedding 
form even if they do not really explain the evolution of another time series (the case of uncoupled systems). However, 
for the Rossler-Lorenz system, even for very large A {A — 0.99), the embedding scheme did not give rise for spurious 
causality, i.e. only for N = 512 the R measure produced positive values for uncoupled systems and for larger N the 
R measure was always zero for the uncoupled systems and for the direction of no coupling. 

Regarding the future vector Xi?, we investigate the effect of the future time horizon T on the embedding vector 
form. We repeat the simulations for Ii setting A = 0.95 and T = 1, . . . , 7, where the size of xf increases with T. 
For the uncoupled Rossler-Lorenz system, the embedding vectors do not have mixed components for either direction 
regardless of T, similarly to the embedding vector forms shown for T — 3 (Table Hill row for C = 0). For very weak 
coupling, the embedding vector forms vary more when T is larger. The high dimension of the joint vector (due to 
a large T) makes the estimation of entropy terms less accurate, and in the presence of small coupling, different but 
close delays are selected as components of the vector form at each run. Besides the differences in the delays of the 
components there are always components from the driving system, e.g. components of x are found when explaining 
y from both x and y in all the realizations for C — 0.5 when T > 4. We note also that the number of components 
of X and y in the embedding vector at each C are the same for T > 4. Moreover, the driving effect can be better 
detected when T is larger, as shown for the Sx^y and Rx^y measures in Figure [H For T — 1, Rx^y finds no 
driving effect X ^ Y unless C > 1.5, whereas for T = 2, 3 it detects driving effect also for C = 1 and for T > 4 also 
for C = 0.5. The measure Sx^y behaves similarly but gives negative values instead of positive when T = 1, 2 and C 
is small, which means that the prediction is actually better when subtracting the x components from the embedding 
vector form. Thus though there is coupling for C = 1, 1.5, the use of y components alone is sufficient to predict y a 
couple of steps ahead (T = 1,2). This is a drawback of the local prediction method in S that uses only the closest 
neighboring point. 

Next we investigate the effect of noise on the embedding schemes. In Table IIV| we show the embedding vector 
forms and their frequencies for the embedding schemes Iq and Ii when 20% observational noise is added to each 



14 



(a) 



1 

0.8 

0.6 

0.4 

M" 0.2 



-0.2 



T 



-0.4, 



/ — T=1 . 

/ ---T=2 

/ --T=3 

/ T=5 

/ — T=7 







(14 




. 






— T=1 




---T=2 


/ \^ 




0.3 


--T=3 
T=5 


// '■■■■■ 


^N-_____^ 




1=7 


/P 




0.2 


// 


0.1 


" '"^'-^ 


°C 


'■■'' / 


/ 




) 


1 2 


3 4 



FIG. 6: (Color online) (a) Sx^y vs C for the coupled Rossler-Lorenz system using the Ii criterion and for different time 
horizons T, as shown in the legend, (b) The same as (a) but for Rx^y- 

variable, fixing the other parameters {A = 0.95, T — 3, N — 4096). The presence of noise changes the results on the 

TABLE IV: Embedding vectors and their frequency of occurrence for the unidirectionally coupled Rossler-Lorenz systems with 
noise. 



c 


lo (x,y) -^ X 


lo {x,y) -> y 


Ii {x,y) ->■ X 


Ii {x,y)- 


■^y 





{x„,X„-6,Xn->>) 77 


{yn,yn-i,yn-3,yn-7 


22 


yXfi, Xn — Qf Xn — 9) 


69 


{y„,y„-i,yn-3,yn-i3) 


17 




(a;„,a;„_6,2;„_g) 7 


(y„,j/„_i,y„_2,yn~6 


12 


(Xn,Xn-6,^n-10) 


15 


{yn, y„-i,y„-3, yn-io, j/n-is) 


15 


0.5 


{Xn,Xn-&,Xn-<i) 81 


{■y„,y„-2,yn-i 


34 


yXn, Xn — 6f Xn — 9) 


81 


iyn,yn-i,yn-3,yn-7) 


36 






{y„,y„-2,yn-5 


23 






(y„,y„-2,yn-5) 


16 


1 


(a;„,a;„_6,a;„_9) 86 


iyn,yn-2,yn-i 


42 


yXn-i Xn — Qf Xn — QJ 


82 


{X„-S, X„-12,yn,yn-2, yn-i) 


12 






{yn,yn-2,yn-3 


31 






{Xn-S, Xn-12,yn,yn-2,yn-t 


)9 


L5 


{Xn,Xn-6,X„-<>) 81 


{Xn-S,yn,yn-2,yn-4 


46 


yXn, Xn — 6'i Xn — 9) 


79 


(x„_8, Xn-12, l/n, J/n-2, J/n-s) 


27 






{x„-S,yn,yn-2,yn-3 


24 


{Xn,Xn-6,Xn-lo) 


10 


(a;„_8, x„-i_3, yn,y„-2, yn-s) 


17 


2 


(a;„,a;„_6,a;„_g) 78 


{x„-s,x„-i2,y„,y„-2 


20 


yXn-i Xn — Qf Xn — 9) 


78 


{X„-S, Xn-12, yn,yn~2, yn-?) 


11 




{Xn, X„-(j, X„-s) 8 


(x„_8,X„_ll, y„, yn-2 


14 


{Xn, Xn-6, Xn-io) 


10 


{X„, X„-7, yn,yn-2, yn-b) 


11 


3 


{Xn,Xn-e,Xn-9) 69 


{x„,X„-4,yn,yn-2 


42 


{Xn,Xn-6,yn-5) 


78 


{Xn, Xn-A, J/n , y„-2, J/n-14) 


17 




{x„, x„-(j, x„-s) 7 


{x„,Xn-5,yn,yn-2 


20 


yXn^ Xn — 6^ Xn — 9 


)8 


{Xn,Xn-5,yn,y„-2) 


13 


4 


(a;„,x„_6,a;„_g) 69 


{x„,Xn-e>,yn,yn-2 


21 


{X„, X„-fi,Xn-9) 


29 


(Xn-9, X„-I3,y„, yn-2) 


20 




yXn^ Xn — &^ Xn — io) > 


{Xn-2,Xn-g,y„,yn-2 


11 


{x„,x„-6,y„-6,yn-s) 


14 


(Xn, Xn-7, y„, J/„-2, yn-s) 


10 



noise-free system. More forms of embedding vectors are observed mostly because a component is replaced by another 
temporally close component. For example, component Xn-9 is often replaced by Xn-i and Xn-w- Another interesting 
result is that criterion Ii for C — Z chooses the component j/„_5 for explaining x. Since the coupling is unidirectional 
this seems wrong. However, the two variables are functionally related for this coupling strength, and the inclusion 
of t/„_5 may be explained by a better noise-reduction. For j/, both criteria give a large variety of embedding vector 
forms. In fact for some cases the most observed embedding vector has a frequency near 10%. Criterion Iq is a bit 
more consistent but again fails to detect the coupling for C = 1. The Sx^y and Rx^y measures give the same 
pattern as for the noise- free case but take smaller values (see Figure HJJb)). 
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For the peculiar case of predicting x for C = 3 using the embedding scheme Ii, the value of Sy^x is 0.25 (not 
shown), which indicates significant coupling from y to x. This is not very worrying, because most information transfer 
measures tend to find such erroneous couplings when there is generalized synchronization. 

D. Unidirectionally coupled Mackey- Glass 

The last deterministic system in our simulation study is that of unidirectionally coupled identical Mackey-Glass 
equations [42] given by 

at) = iV:ittyo -o.Mt) 

■ (.. _ o.2y{t-A2) n i7/C/^ + r^iii^^i) 

For Ai, A2 = 17, 30 and 100, coupling strength C = [0, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5] and all possible combinations of 
driving-driven time series we study the embedding vector forms. Because the complexity of the individual uncoupled 
system differs significantly with the value of A (meaning Ai or A2), the choice of xf is of great importance, and we 
tried different 'Xp with regard to both the coupling strength and the systems under investigation. 
At first, a limited study of few realizations was performed for three choices of x^^, 

1. xf = (2/«+i), 

2. XF = (y„+i,y„+2,...,2/„+ri), 

3. XF = (j/„+l,y„+2,...,2/„+r2), 

where ri is the lag that gives the first minimum of mutual information and T2 the first maximum. These values were 
estimated for each coupling strength separately and we summarily present our conclusions. We observed that on the 
uncoupled systems where we have indications of the proper embedding dimension from the literature, the first choice 
of xf seriously underestimated it, especially for A = 100 where the embedding dimension was estimated as 2, when 
an appropriate value should be greater than the fractal dimension of about 7. Also, there were a lot of cases where 
despite the presence of moderate and strong coupling the embedding vectors for the driven system (y) did not have 
components from the driving one {x). This was also observed for the Rossler-Lorenz system (see Figure [6] for T — 1). 
The second choice gave reasonable embedding dimensions on the uncoupled systems, 3 for A = 17, 4 for A == 30 
and 9 for A = 100, while the third choice gave respectively 3, 5 and 10. The values of both these choices are in 
order, being always just greater than the respective fractal dimensions. For the uncoupled time series of A = 100 
the value of ri was 14 and for T2 it was 26 and both increased slightly for moderate coupling. These values are quite 
large and the estimation of mutual information on such multidimensional vectors including all the lags from to ti 
or T2 is tricky and computationally demanding. A nice and simple alternative that we opted to use ultimately was 
^F = iyn+i,yn+ri,yn+T2)- III t^is way, wc includc information further into the future and restrict xf to a three 
dimensional vector, which is efficient for mutual information estimation. 

Using this form of xf we performed 100 realizations for cross-modelling of x from x and y, and y from x and y. 
The time series length was set to -/V=4096, L^ — Ly = 50 and the Ii criterion was used. All combinations of Ai, A2 
and both coupling directions were examined and we marked the number of components from each system that were 
selected in the embedding vectors. The results for explaining y from x and y are shown in Fig. [71 where each line 
regards the number of components from each system that are used in forming the embedding vector. We see that for 
all combinations of Ai, A2, as coupling increases the number of components of the driven system reduces and this of 
the driving system increases, as we would expect. The embedding dimension for A2 — 100 is again underestimated, 
being 5 for the uncoupled case, but the conclusions regarding coupling direction and strength are sound. 

For the inverse direction, i.e. cross-modelling of x from x and y, the embedding vectors were comprised almost 
exclusively from components of x. One exception was for Ai = 17, A2 = 30 and C — 0.5, where coupling was detected 
in the wrong direction, from y to x. For such strong coupling there is generalized synchronization between the two 
systems and components from the two systems can be interchanged. 

The S and R measures gave qualitatively the same results as these in Figure[7|and therefore they are not presented. 

E. Coupled linear stochastic systems 

The proposed embedding scheme is based on entropies and makes no assumption for deterministic or nonlinear 
underlying dynamics, so in principle it can also be used in stochastic systems. Popular models for stochastic multi- 
variate time series are the vector autoregressive (VAR) models of order to, assigning equal number of delays to all p 
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FIG. 7: Number of components from each time series for mixed embedding vectors with regard to the coupling strength for 
the Mackey-Glass systems. Solid line denotes the driving system and dashed line the driven system. The panels are ordered 
for the systems indicated by Ai for the driving system and A2 for the driven system. The size of the markers indicates the 
frequency of occurrence of the most frequently selected embedding vectors (same size of the markers for each C value) . For 
example, the size of the marker in panel (e) for C = regards the largest frequency 100, whereas in the same panel for C = 0.2 
is the smallest frequency in these results equal to 42. 



variables, and the dynamic regression (DR) or distributed lag models that allow for different delays for each variable. 
Commonly the VAR and DR models are best known in their linear form [43, |4J] . 

We test the proposed embedding scheme in identifying the correct components present in the form of a linear DR 
model of order one in both variables 






0.5y„ 



Cx„ 



£2/ 



Only the second variable y depends on x and the coefficient C determines the strength of dependence. The random 
components ei^„ and e2,n are independent to x and y and to themselves for any lag and have mean zero and standard 
deviation one. The embedding vector forms and their frequencies for the embedding scheme Ii explaining x from x 
and y, and y from x and y, are shown in Table IVl for C — 0.0, 0.1, 0.2, 0.3, 0.4, 0.5 and two time series lengths N — 512 
and N — 4096 [A — 0.95, T = 1). For the direction of no driving effect, as well as for C = 0, the embedding scheme 
gives varying embedding vector forms when N = 512. When N — 4096, it identifies correctly the sole regressor 
in about half of the realizations, whereas for the rest realizations the model order is overestimated adding another 
component in the embedding vector from x or y. Overestimation of the model order is often observed by classical 
model order criteria, such as AIC and BIC, but for shorter time series [43, indicating that the proposed embedding 
scheme is more data demanding than other standard statistical techniques for model order selection. For any of the 
embedding vector forms, the absence of true driving effect is correctly identified by the S and R measures being zero. 
In the direction of driving effect, the correct embedding vector (a;„, 2/„) is found with a frequency that increases with 
C, slowly for N = 512 and rapidly for N = 4096. The measures S and R increase accordingly and the mean S and 
R get positive for C > 0.2. 

The coupling is detected similarly with the embedding scheme for a DR model of order one in the first variable and 
order two in the second variable 

Xn+l = 0.3.T„ - 0.5.T„_i + ei,„ 

yn+i = 0.4a;„ - 0.3?;„_i + £2,™ 
where ei_„ and 62, n are independent as above. The embedding vector for explaining x„+i is correctly identified as 
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TABLE V: Embedding vectors and their frequency of occurrence for the VAR system in two variables of order one for A'' — 512 
and TV = 4096. 



c 


AT = 512 


Ii (a;,y) -^ 


X 

N = 4096 


A = 512 


Ii (a;,y) -^ y 




A = 4096 





(a;„)9 
{x„,yn-i) 3 




(x„) 46 
(a;n,yn-3) 6 


(2/n)6 
(Sn-l.yn) 3 






(yn) 44 

(2;n-2, yn) 7 


0.1 


(Xn) 8 

{x„,yn-i) 4 




(x„) 42 

(Xn,yn) 7 


(k„,i/„) 6 

{Xn,yn,yn-3) 4 






(yn) 33 

(Sn,yn) 20 


0.2 


{x„) 12 

{Xn,Xn-4) 4 




{x„) 44 

{Xn,yn) 6 


{x„,y„) 13 

(a;n,yn,yn-2) 4 




(s^nij/n) 83 
{Xn,yn,yn-5) 3 


0.3 


(xn) 7 

{x„,X„-4,) 4 




(x„) 46 

(3::n,yn) 5 


(Xn,J/„) 27 

(a::„,a::„_2,?y„) 8 




{x„,y„) 95 
{x„,y„,yn-5) 1 


0.4 


(a;„,a;„_4) 4 

(Xn) 3 




(Xn) 46 

(x„,y„_3) 6 


{Xn,yn) 53 
(x„,X„_2,yn) 6 






{Xn,y„) 100 


0.5 


{x„)7 

{x„,X„-4) 4 




{Xn) 46 

(a::n,2/n-i) 7 


{Xn,yn) 74 

(xn, a::„_2,jy„) 6 






(2:„, y„) 100 



(a;„, a;„_i) in 12 out of 100 realizations when A'' = 512 and increases linearly with A'' (25 for N = 1204, 50 for A^ ~ 2048 
and 78 for A^ = 4096), where again the embedding vectors are augmented in the rest reahzations. For explaining j/„+i 
the correct embedding vector (xn,yn-i) is found in 14 realizations when N — 512 and increases again linearly with 
A'^. Thus the proposed embedding scheme, and subsequently the S and R measures, can detect the correct direction 
of coupling but require considerable data size. 



F. Significance of S and R measures 



Reliable detection of directed coupling requires that the coupling measure is examined for its significance. For 
example, a positive Rx~^y can be considered to indicate a driving effect of AT on y only if it exceeds a threshold of 
significance and, on the other hand, it should not exceed it if the driving effect is not present. In the lack of analytic 
null probability distribution of the coupling measure, i.e., distribution under the null hypothesis of no coupling, 
surrogate techniques have been proposed [12, |45(]. 

We use the simple technique of time shifted surrogates in [IJI, and for each randomly selected time index i (re- 
quiring i > 100) the surrogate couple of time series is as the original but the first time series is time shifted to 
(xi, Xi+i, . . . , Xn, Xi,X2, . . . , Xi^i). The embedding scheme on the surrogate bivariate time series when explaining y 
from X and y contains components from y, and components from x enter the embedding vector by chance and re- 
gardless of the coupling strength. Thus the measures Sx^y and Rx^y computed on surrogate bivariate time series 
are predominately zero and may get a positive value whenever a component from x enters the embedding scheme. 
Obviously the null distribution of Sx^y and Rx^y is not symmetric and the decision for the surrogate data test 
should be taken on the basis of rank ordering rather than z-score. 

We illustrate the correct significance and good power of the measures Sx-^y and Rx-^y from criterion Ii on the 
coupled Rossler-Lorenz system {A = 0.95, T = 3, N = 4096). In FigurelH we show the estimated probability (relative 
frequency from 100 realizations) of rejection of the null hypothesis of no coupling using rank ordering. The test is 
one-sided and if, say, the Sx^y value computed on the original pair of time series is larger than all Sx^y values 
computed on M = 40 surrogate pairs, the null hypothesis is rejected at the significance level a of about 0.02 (the 
p- value of the test at this case is 1 — (41 — 0.326)/(41 -I- 0.348) = 0.0163, using the corrected rank estimation in pa]). 
In Figure [5] the results are shown also for a « 0.05 (when the statistic for the original data set is second largest we 
have p = 0.0405). The graphs of the probability of rejection of no coupling vs coupling strength C are in agreement 
with the results on Sx->-y and Rx^y in Figure IH and show that though Sx^y and Rx^y do not reach maximum 
values for C = 1 the driving is detected with the same great confidence as for larger C. For the time series length 
A^ — 4096, the power of the surrogate data test is small when C — 0.5 as even the largest probability of rejection 
found for Rx^y is at 0.3. Higher statistical confidence of rejection can be achieved when A^ is doubled in agreement 
with positive Rx^y in Figure [5^. 
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FIG. 8: (Color online) Percentage of rejection vs coupling strength C for the surrogate data test of no causality from x \,o y 
for the coupled Rossler-Lorenz system (A = 0.95, T = 3, N = 4096). The measures 5* and R used as test statistics and the 
significance levels a are given in the legend. 

V. APPLICATION TO EEC 

We apply the non-uniform multivariate embedding procedure on EEG recordings of two epileptic patients, one with 
generalized tonic clonic seizure and one with partial complex seizure. The analysis was restricted to specific EEG 
channels and the goal was to study information transfer between anti-diametrical areas of the brain in the left and 
right hemisphere. Four channels are selected on each hemisphere and for each channel the average of its 4 nearest 
channels is subtracted. This transformation is called surface Laplacian estimation and it is considered to improve 
the spatial resolution of scalp potentials by reducing the common activities between neighboring electrodes, refining 
the data and removing possible artifacts (47|. The channels used are C3, T7, F3, and P3 on the left hemisphere and 
the corresponding C4, T8, F4 and P4 on the right hemisphere. The sampling time is 0.01 sec and the record lengths 
are approximately 5 hours for patient A and 4 hours for patient B. Taking the channels in pairs (each one with its 
antidiametrical) we split the time series into segments of length TV = 3000 data points (corresponding to 30 seconds 
duration) and apply the embedding scheme for mixed modelling. The threshold is v4 = 0.95, the maximum lags are 
chosen L = 20 and xp — Xn+i (this selection is justified by the vanishing of delayed MI after the first lags). For 
each left-right pair of channels we obtain embedding vectors for explaining each of the two channels. The number 
of components from each channel (left, right) in the mixed embedding vector for all 4 channel pairs and the two 
directions are given in Fig. [S] for patient A and Fig. [TU] for patient B. 
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FIG. 9: Number of components from each channel for mixed embedding vectors for patient A. Each row of 2 panels corresponds 
to a mixed embedding for explaining the signal of the channel indicated by the label on the left of the panels. Each left panel 
shows the number of components in the mixed embedding that are derived from the channel being explained and the right 
panel those from its antidiametrical. For example, the upper left panel has the number of components of C3 and the upper 
right those of C4 used in the embedding explaining C3, both as a function of time. The dashed vertical line shows the seizure 
onset. 
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FIG. 10: The same as Fig. |9]but for patient B. 



There are some common conclusions to be drawn for both patients. There seems to be httle or no information 
exchange between channels P3 and P4 as indicated by the embedding vectors having almost always no components 
from the antidiametrical channel (bottom panels in Fig. |9] and Fig. [TO)). For channel C4, the embedding vectors tend 
to have more components from both C3 and C4, than for channel C3 (more clearly for patient A), indicating possibly 
difference in the activity of these areas of the brain. A noteworthy observation is that soon after seizure onset there is 
no information exchange (or at least not strong enough to be detected by our approach) between the two hemispheres, 
and in all cases the mixed embedding vectors have no components from the antidiametrical channel (bottom parts on 
all panels on the right of the dashed line of seizure onset). This is more clearly visible in patient B where the record 
extends further after the seizure. This finding is in agreement with recent reports on resetting of the brain activity 
at the seizure end [43, |4^ . 

For patient A there is diversity in the embedding vectors with regard to the channels. The dimensions for channels 
C3 and P3 as well as C4 and P4 are similar in form, while for T7 and T8 there is a substantially higher number of 
components, again indicating difference in the activity of these areas of the brain. There does not seem to be any 
change in the embedding vector forms as the seizure onset approaches. In patient B we see that after seizure onset the 
dimension of the embedding vectors decreases dramatically, while right before the onset there is a slight increase in 
the number of components from both channels contributing to the embedding that explains channels C3, T7 and C4, 
T8 (Fig.[TUl top 4 panels). For channel F3, there is a large increase in the vector dimension in the time interval 70-25 
minutes before seizure and for channel F4 the embedding vectors have consistently, more or less, high dimensions. 
The respective profiles of S and R measures are in complete agreement with the profiles of the number of components 
of the mixed embedding, as shown for patient B in Figure [TT] (similarly for patient A, not shown). 

The difference in the results of corresponding channels between the two patients may be due to different seizure 
types. Patient A has generalized tonic clonic seizure and we expect the results from all channels to be more or less 
the same, while patient B has partial complex seizure and we expect the results to differ from channel to channel. 



VI. DISCUSSION 



The proposed method for reconstruction with a purpose works well in the tested systems. Our embedding scheme 
uses information measures in order to decide at each step which component to be included in the embedding vector. 
The estimation of mutual information on high-dimensional vector variables introduces large bias and variance. We 
used the estimator of iiT-nearest neighbor distances that is rather stable for high dimensions. Further, we found that 
the use of the conditional mutual information instead of the mutual information reduces the bias. The embedding 
scheme using the conditional mutual information turned out to be more efficient in selecting the embedding vector 
components. This was demonstrated in simulated chaotic systems of different complexity. 

The termination of the embedding procedure is done using a threshold criterion and we found that a strict threshold, 
such a& A = 0.95, may lead to under-embedding for continuous systems. We note however that for the application of 
information transfer detection a strict threshold is needed. If one uses a larger threshold value than 0.95, this would 
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FIG. 11: The R measure for the mixed embeddings of Fig. [10] for patient B. Rc4,^C3 is shown in top left panel and Rc3^C4, 
in top right panel. Similarly for the other panels, as indicated by the inset text. 

lead to quite large embedding dimensions and m.any components that contribute very little would be included. We 
found that this does not necessarily impair the accurate detection of coupling, but under specific data conditions, e.g. 
noisy data, it may lead to spurious coupling detection. 

The strict threshold and the progressive nature of our method guarantees that from all the components of the 
embedding vector, only few (if any) have small contribution, and this is what allows the clear detection of information 
transfer. If for example we have two time series from a coupled system with minimum embedding dimension mi and 
TO2, respectively, and the embedding vectors for cross-prediction are large enough to contain mi components from the 
first and TO2 from the second, then the sub-embedding used in the estimation of measures S and R would theoretically 
be equally good to the complete embedding, and the measures would not be able to clearly detect the coupling. The 
embedding that we propose here is of a minimum "adequate" dimension in the sense that it is sufficient to correctly 
detect interdependencies and free of redundant information that may confuse the applied measures. 

The proposed embedding scheme can be used for different purposes in univariate and multivariate time series 
analysis, such as invariant estimation, cross-prediction and mixed prediction. In particular, it is capable of detecting 
the coupling and measuring its strength, as demonstrated on different nonlinear deterministic systems, as well as 
stochastic linear systems, and on epileptic multi-channel EEC For the latter, we observed that though before the 
seizure the activity in one channel could be explained using a mixed embedding containing also components from the 
antidiametrical channel, right after the seizure there were no such components any more. Thus information fiow can 
be coarsely detected from the form of the embedding vector. 

For a quantitative indication of information flow, we developed the prediction measure S and the information 
measure R, both making use of the mixed and projected embedding vectors. These measures were able to detect 
different degrees of coupling, and they were found to have good significance and power using appropriate surrogate 
data testing. 

Our embedding scheme is certainly not restricted to bivariate time series and it is straightforward to apply it to 
more than two time series in order to investigate direct and indirect coupling among them. 
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